export model to fp16 #347
base: master
Conversation
Ok, I see you went for a much deeper change. Did you manage to test it?
It is not tested yet. I am working on implementing model loading (version 0 and maybe 1).
Question: what is the benefit of fp16?
Am I missing some considerations?
b = struct.pack(f'{len(d)}b', *d)

def serialize(file, tensor, type):
    """ writes one tensor to file that is open in wb mode """
    if type == 'fp32':
feels simplifiable
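One way the per-type branching could be simplified, as the reviewer suggests, is a format-string lookup instead of an if/elif chain. This is a hedged sketch, not the PR's actual code: `struct` natively supports 'f' (float32) and 'e' (float16), while bf16 has no `struct` format code and would still need special handling, so this sketch covers only the two `struct`-native types.

```python
import struct

# Illustrative simplification (assumption, not the PR's implementation):
# map the export type string to a struct format character.
FMT = {"fp32": "f", "fp16": "e"}  # bf16 would need manual byte handling

def serialize(file, values, type):
    """Writes one flat list of floats to a file that is open in 'wb' mode."""
    file.write(struct.pack(f"{len(values)}{FMT[type]}", *values))
```

With this shape, adding a new `struct`-supported dtype is a one-line table entry rather than another branch.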
@@ -129,7 +133,7 @@ def legacy_export(model, filepath):
 # -----------------------------------------------------------------------------
 # new version

-def version1_export(model, filepath):
+def version1_export(model, filepath, type):
we'd have to serialize the type to the header too
The point is that fp16 weights can be loaded directly onto the GPU. Also, I tested llama2.c on an ARM machine using its native fp16 support and it works like a charm (and ARM CPUs are cheaper on AWS).
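The trade-off being discussed can be shown in a few lines: fp16 halves the storage per weight but keeps only ~11 bits of mantissa, so round-tripping a value through fp16 loses low-order precision. This is a standalone illustration using `struct`'s 'e' (half-precision) format, not code from the PR.

```python
import struct

x = 3.14159265

# fp32 stores the value in 4 bytes, fp16 in 2 bytes.
fp32_bytes = struct.pack("f", x)
fp16_bytes = struct.pack("e", x)
print(len(fp32_bytes), len(fp16_bytes))  # 4 2

# Round-tripping through fp16 keeps the value close, but coarser than fp32.
x16 = struct.unpack("e", fp16_bytes)[0]
print(abs(x - x16) < 1e-2)  # True
```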
@@ -450,6 +454,7 @@ def torchscript_export(model, filepath, zero_params=False, gzip_output=False):
 parser = argparse.ArgumentParser()
 parser.add_argument("filepath", type=str, help="the output filepath")
 parser.add_argument("--version", default=0, type=int, help="the version to export with")
+parser.add_argument("--type", default='fp32', type=str, help="the data type to export to (fp32, fp16, bfloat16)")
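One small hardening that could be applied to the flag above (a sketch, not part of the PR): using argparse's `choices` parameter so that a typo like `--type fp8` fails at parse time instead of silently falling through the serializer's branches.

```python
import argparse

# Sketch of the --type flag with choices validation (assumption, not the PR's code).
parser = argparse.ArgumentParser()
parser.add_argument("filepath", type=str, help="the output filepath")
parser.add_argument("--version", default=0, type=int, help="the version to export with")
parser.add_argument("--type", default="fp32", choices=["fp32", "fp16", "bfloat16"],
                    help="the data type to export to")

args = parser.parse_args(["model.bin", "--type", "fp16"])
print(args.type)  # fp16
```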
I'm not 100% decided whether type should be a separate variable that is written into the header, or whether it should just be absorbed into version. E.g.:

version 0: original, float32
version 1: original, float16
version 2: new header, int8

etc., and just go that way.
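The "separate variable in the header" alternative being weighed here could look like the sketch below: version describes the file layout, and the dtype is its own header field. The magic value and field order are illustrative assumptions, not the project's actual format.

```python
import struct

# Hypothetical type codes and header layout (assumptions for illustration).
TYPE_CODES = {"fp32": 0, "fp16": 1, "bf16": 2}

def write_header(file, version, type):
    """Writes an illustrative 12-byte header: magic, version, dtype code."""
    file.write(struct.pack("iii", 0x616B3432, version, TYPE_CODES[type]))
```

The appeal of this shape is exactly what the later comment raises: new dtypes become new table entries rather than new "versions" of the whole format.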
There's another PR that just uses "--version" for this.
Don't forget support for bf16 (and maybe others to come).
If we use a version number for each type, it is both unintuitive and will produce a lot of "versions".